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62061.0105 

1 This application claims the benefit of the earlier filed US Provisional Pat. App. Ser. 

2 No. 60/165,204, filed 12 November 1999 (12.11.99), entitled "Method and Apparatus for 

3 Processor Verification" which is incorporated by reference for all purposes into this 

4 specification. 

5 BACKGROUND OF THE INVENTION 

6 Field of the Invention 

7 The present invention relates to the functional verification of integrated circuit hardware 

8 P designs. In particular, the present invention provides a method and apparatus that functionally 

9 ^ verifies processors capable of executing more than one instruction at the same architectural time. 

f-T~3 

10 S Description of the Related Art 

yi 

1 1 Random test generation programs have been around since the early days of 

12 y> microprocessor verification. These programs automate the complex job of creating large test 

13 ¥* suites necessary for the functional verification of computer systems. From a historic perspective, 

14 tJ functional verification has focused on two types of circuits, combinatorial and sequential. 
P 

15 Combinatorial circuits do not contain state information, and therefore the outputs are strictly a 

16 function of the input values. Sequential circuits contain memory elements that allow the circuit 

17 to have a current state. This makes the output values a function of both the current state and the 

18 input values. Early on, those in the industry recognized that creating purely random input values 

19 'was not an adequate method of verifying sequential circuits, and a new test methodology that 

20 included the current state of the circuit was developed. 

21 One method used to generate test patterns for processor verification that is well known in 
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1 the art is to use a computer to generate a machine executable program that is assembled from 

2 instructions, control flags, and data selected from specially prepared tables. To get the expected 

3 results, the automatically-assembled test program is simulated on an existing hardware or 

4 software golden model under driver control. The test operator can change the drivers, thus 

5 changing the conditions under which the test programs are executed. This approach to testing is 

6 generally characterized as static testing, because the test patterns are assembled first, and are 

7 then executed in the simulation to get the expected results. The test pattern development process 

8 is not influenced by the intermediate machine states of the processor during the execution of the 

9| test. 

SI 

lOiu. While static testing is highly useful, it does have drawbacks. One obvious drawback is 

llp that tests created without knowledge of the intermediate machine states have great difficulty 

5 

12 SI achieving high coverage of all the possible intermediate states. In addition, statically generated 

13H tests are restricted in that certain real-life cases can never be simulated. For example, some 

I4^f instructions within a statically generated instruction sequence must always be preceded by a 
specially-inserted data initialization instruction, which would ordinarily not appear in normal 

16 operation. Similarly, register usage can be somewhat more limited in a static simulation than it 

17 would ordinarily be in normal operation. Finally, statically generated simulations that include 

18 loops and branches must be very carefully (and therefore, artificially) constructed to avoid 

19 undesirable test results such as endless loops, branches that are never taken, and the like. 

20 To address some of these issues, and to create a test environment that is more efficient 

21 and conducts a more robust test of very complex sequential circuits such as processors, the 

22 industry has moved toward the use of dynamically generated, biased pseudo-random test 
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1 patterns. In dynamic testing, instructions are generated, all processor resources and facilities 

2 needed for executing the instruction are identified and initialized if required, the instruction is 

3 executed on a simulated processor, and the simulated processor state is updated to reflect the 

4 execution results. The process iterates, and each instruction generated at the beginning of each 

5 iteration is generated with knowledge of the processor state that resulted from the last step 

6 executed. Although there are differences in the details of how instructions are generated from 

7 test method to test method, in general, instructions are drawn from an instruction pool or tree 

8 specified by the test operator that contains the various types of instructions to be tested. In 

9 P addition, many pseudo-random dynamic test generation methodologies allow the test operator to 

%p 

i 

10-7 bias the instruction pool or tree, meaning that the operator can specify what percentage of the 

O 

1 1 « total instructions generated and simulated should be load-store instructions, what percentage 

yi 

12 %4 should be arithmetic instructions, what percentage should be floating point instructions, and so 

13 |M> forth. Those skilled in the art are generally familiar with the various current methods used to 

14 H conduct functional verification of complex circuits using dynamically generated biased psuedo- 

15 O random test patterns. Readers unfamiliar with these methods are referred to US Pat. No. 

O 

16 5,202,889 to Aharon et al., entitled "Dynamic Process For The Generation Of Biased Pseudo- 

17 Random Test Patterns For The Functional Verification Of Hardware Designs" and the patents 

18 and publications referenced therein, which is incorporated by reference for all purposes into this 

19 disclosure. U.S. Patent No. 5,202,889 to Aharon is a good example of the current state of the art 

20 of pseudo-random test pattern generators and their usefulness for functional verification of 

2 1 complex circuits . 

22 Dynamic testing conducted in the manner described by Aharon eliminates the 
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1 requirement for the cumbersome predevelopment of data tables and code streams that is 

2 necessary to achieve a high level of confidence in static testing. Also, because instructions can 

3 be automatically and iteratively generated and then simulated using an appropriately biased 

4 instruction pool, dynamic testing can also provide greater test coverage than static testing in a 

5 shorter amount of time. Nevertheless, dynamically generated tests using psuedo-random test 

6 patterns such as that described by Aharon have drawbacks and limitations as well. 

7 For example, generating stimuli based solely upon current machine state information may 

8 not adequately test some complex processors such as Very Long Instruction Word ("VLIW") 
9O and Reduced Instruction Set Computer ("RISC") processors, which may have special rules 

10|7 regarding usage of certain resources. Similarly, some processor architectures allow certain 

b 

1 1 q groups of instructions to execute at the same architectural time. Instructions that are dynamically 

m 

12 %J generated and simulated in a pseudo-random test pattern must therefore adhere to the processor's 

13 H 3 instruction grouping rules. This creates another case where the creation of input stimulus is not 

14 M; just a function of the current state, and therefore requires additional components in the test 

15 M generator to restrict the grouping of instructions and possible operands to a given set of rules. In 

P 

16 addition, while a dynamically generated test setup may run flawlessly on a single processor 

17 model, the same test may violate memory coherence rules when run on a multiprocessor model. 

18 Finally, the creation of highly deterministic code sequences, such as sequences that execute if 

19 then else statements and other conditional control breaks, can be problematic in dynamically 

20 generated tests. The present invention addresses the limitations of current-technology dynamic 

21 psuedo-random test pattern generators, which currently select and generate instructions based 

22 only upon current machine state. The present invention thus provides a better functional 

5 
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1 verification test of some processors, by taking into account machine state, resource availability, 

2 and instruction grouping rules in selecting and generating pseudo-random test stimuli. The 

3 present invention is a method and apparatus that generates tests that verify that a processor 

4 system under test properly executes two or more instructions issued and executed in parallel. 

5 The present invention generates tests by dynamically tracking selected system resources of a 

6 golden model and by selecting and scheduling instructions for simultaneous execution on the 

7 golden model based upon current and future system resource availability, current architectural 

8 state of the golden model, and the instruction grouping rules and other architecturally-related 
90 rules of the golden model and the processor system under test. The present invention simulates 
10 S ^ the parallel execution of valid groups instructions, and then updates the architectural state of the 

lip? golden model. The present invention outputs a test that includes groups of instructions 

in 

12CJ designated for parallel execution on the processor system under test, plus information regarding 

13^ the correct intermediate states of selected system resources during the test, which enables the 

14 1^ user to compare the test results obtained when the test is run on a processor system under test 

o 

15 O with results obtained when the test is run on a golden model. 

p 

16 
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1 SUMMARY 

2 The present invention is a method and apparatus that dynamically generates tests that can 

3 be used to verify that a processor system under test properly executes two or more instructions 

4 issued and executed in parallel. The present invention includes a user preference queue, a rules 

5 table, plurality of resource-related data structures, an instruction packer, and an instruction 

6 generator and simulator. The user preference queue has a number of entries, each of which 

7 comprises an instruction to be tested, a group or tree of instructions to be tested, or a test 

8 generator control command. The resource-related data structures comprise information such as 

9 0 actual and/or predicted past state, present state, and future state of selected system resources. In 

10 ^ this specification the term "system resources" is defined to include both architectural resources 

1 1 ^ and conceptual resources. Architectural resources include resources like the architectural state of 

m 

12 ^ the processor (as defined by the state of specific registers), other registers, the load/store buffer, 

E 

13 y* and the like. The term "conceptual resources" refers to resources that may not actually exist 

Ljl 

14 H within the architectural state of the machine under test, but instead represent a simple way to 

15 O represent complex interactions, such as whether or not a branch instruction is pending, or the 

16 usage of various units over time, or estimated bus traffic at a particular point in time. 

17 The instruction packer creates a very long instruction word that generally comprises a 

18 packet or bundle of two or more instructions valid for parallel execution by the processor under 

19 test. The instruction packer selects instructions from the entries in the user preference queue, 

20 based upon resource availability as indicated by the resource-related data structures and the 

21 architecturally-related instruction grouping rules appropriate for the golden model and the 

22 processor system under test. In one mode, the instruction packer selects a first instruction from 
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1 an instruction tree in the user preference queue. The instruction packer then selects a second 

2 instruction to be packed into one VLIW packet with the first instruction by eliminating 

3 instructions in the instruction tree that are ineligible for selection, either because those 

4 instructions requires a resource that a resource-related data structure indicates is unavailable, or 

5 because they require a resource that will be utilized by the first instruction selected and cannot be 

6 shared between instructions within a VLIW packet, or because they conflict with the instruction 

7 grouping rules appropriate for the golden model and the processor system under test. The 

8 instruction packer continues selecting instructions in this manner, by first defining potential 

9 O groups of instructions valid for selection and then selecting instructions from those groups, until 

10 the instruction packer has selected a group of N instructions, where N is the maximum number of 

1 1 g instructions that the golden model and the processor system under test can execute in parallel, or 

in 

12 %j until there are no more instructions in the instruction tree that can be validly selected. 

13 In another mode, the instruction packer selects instructions in order, from an ordered list 

Ljl 

14 N that the user provides and that is loaded into the user preference queue. In this mode, the 

O 

15 O instruction packer selects instructions for parallel execution in the order specified by the user, but 

16 stops packing instructions into a VLIW word if an instruction specified by the user cannot be 

17 executed in parallel with previously selected instructions, because of resource unavailability, 

18 conflicting resources with previously selected instructions, or violations of the instruction 

19 grouping rules. In this case, the instruction packer either packs "no operation" instructions into 

20 the remaining slots of the VLIW packet, or simply issues the VLIW word "as is" if the processor 

21 architecture allows VLIW packets of varying lengths and formats. 

22 The present invention marks instructions that have been packed into a VLIW packet to 
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indicate that the instructions should be executed in parallel. The instruction generator and 
simulator generates the instructions selected by the instruction packer, and then simulates the 
execution of the instructions on the golden model. The present invention updates the appropriate 
resource-related data structures that correspond to the system resources utilized by the 
instruction. The present invention evaluates the architectural state of the golden model and 
updates the appropriate resource-related data structures. 
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1 BRIEF DESCRIPTION OF THE DRAWINGS 

2 To further aid in understanding the invention, the attached drawings help illustrate 

3 specific features of the invention and the following is a brief description of the attached 

4 drawings: 

5 FIG. 1 shows a typical networked computer environment. 

6 FIG. 2 shows a block diagram of a hypothetical VLIW processor architecture and its 

7 corresponding VLIW instruction format. 

8 FIG. 3 shows the major components of the present invention and their high-level 

P 

9 yp interaction with each other. 

10 Mi FIGs. 4A and 4B combine to form a flowchart that shows the control and operational 

5 

1 1 □ flow of the present invention as it creates and conducts functional verification tests. 

Ul 
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1 DETAILED DESCRIPTION OF THE INVENTION 

2 The present invention is a method and apparatus that enables a processor developer to 

3 verify that a processor under test properly executes two or more instructions issued and executed 

4 in parallel. As described in detail herein, the present invention allows for a robust verification of 

5 the parallel execution capabilities of VLIW and other complex processor systems because the 

6 present invention generates a processor test while tracking both system resources and the 

7 architectural state of the processor golden model. The present invention selects instructions for 

8 generation based upon resource availability, processor state, and processor instruction grouping 

9 0 rules. The present invention packs those instructions into a valid very long instruction word and 

yp 

10 J; then simulates the execution of the VLIW by the processor golden model. The present invention 

Q 

1 1 q provides the processor developer with a test comprising a set of pseudo-randomly generated 

i 

12SJ instructions that include instruction packets appropriate for parallel execution, plus time-based 

13 H intermediate state information produced when those instructions are executed on a golden model, 

14 H that can then be run on the processor under test for comparison and verification. 

15 ^ This disclosure describes numerous specific details that include specific hardware and 
Q 

16 software structures and example instruction streams in order to provide a thorough understanding 

17 of the present invention. One skilled in the art will appreciate that one may practice the present 

18 invention without these specific details. In addition, other than the examples provided herein, 

19 details concerning the architecture and design rules associated with current advanced VLIW and 

20 other parallel issue and/or execution-capable processors are not provided in this disclosure to 

21 avoid obfuscation of the present invention. Those skilled in the art will understand that reference 

22 herein to processor implementation rules refers to the manufacturer's rules detailed in designers' 

11 
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1 reference guides supplied by processor manufacturers, such as Texas Instruments' Instruction Set 

2 Reference Guide for the TMS320C62xx CPU, Texas Instruments (July 1997), which is 

3 incorporated into this specification by reference for all purposes. 

4 The present invention is preferably practiced in the context of a standalone or networked 

5 personal computer test setup such as that depicted in FIG. 1. In FIG. 1, typical network 100 

6 includes one or more computer workstations 102 networked together via network connection 

7 104, which is controlled by network controller 106. The network 100 may also include various 

8 peripheral devices, such as network storage device 108 and one or more printers (not shown in 
9 P FIG. 1). Typical computer workstation 102 includes computer 112, internal storage media such 

L S 
~- jj 

10|J as RAM 118, external storage media such as a floppy disk 114, and various interface devices 

XSSS. 

U 

1 1 q such as a mouse 1 16, a monitor 120, and a keyboard 122. Although a particular computer may 

m 

12 only have some of the units illustrated in FIG. 1, or may have additional components not shown, 

13 H 5 most computers will include at least the units shown. 

14^3 In the embodiment described herein, the present invention is implemented in the C++ 

15^ programming language. C++ is a compiled language, that is, programs are written in a human- 

16 readable script and this script is then provided to another program called a compiler, which 

17 generates a machine-readable numeric code that can be loaded into, and directly executed by, a 

18 computer. Those skilled in the art are very familiar with C++, and many articles and texts are 

19 available which describe the language in detail. While the exemplary embodiment detailed 

20 herein is described using C++ code fragments and pseudo-code, those skilled in the art will 

21 understand that the concepts and techniques disclosed herein can be practiced by test designers 

22 and operators using any higher-order programming language, such as Java, without departing 

12 
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1 from the present invention. 

2 FIG. 2 is a block diagram showing a hypothetical VLIW processor 150 having two 

3 floating point units 154, 156, an integer arithmetic logic unit 158, a branch processor 160, and a 

4 load/store unit 162. VLIW processor 150 also includes an instruction issue unit 152 and a 

5 register file 164. FIG. 2 also shows a very long instruction word format 130 for hypothetical 

6 VLIW processor 150. As shown in FIG. 2, the instruction word format 130 includes fixed slots 

7 for five RISC-like instructions: a floating point addition instruction 132, a floating point 

8 multiplication instruction 134, an integer arithmetic instruction 136, a branch instruction 138, 
9^ and a load/store instruction 140. Those skilled in the art will recognize that FIG. 2 shows a 

lO^: highly simplified, hypothetical case only. For example, rather than having fixed "slots" and 

O 

1 1 p fixed-length instruction words, some VLIW processors are much more complex and may utilize 

Ul 

12 SJ instruction words of varying lengths, comprising one or more simple instructions that do not 
13H reside in fixed slots, but can be packed in any order. The intention of FIG. 2 is to demonstrate 

jjSSB! 

14 |f that a VLIW instruction comprises a group or packet of independent, RISC-like instructions that 

p 

15^ are issued in parallel to multiple independent execution units within the processor. The 

16 architecture of the processor thus dictates what kinds of instructions can be grouped together and 

17 executed in parallel. The hypothetical processor shown in FIG. 2 can execute simultaneously 

18 one floating point addition instruction, one floating point multiplication instruction, one integer 

19 arithmetic instruction, one branch instruction, and a load/store instruction, as long as the 

20 instructions all utilize different local destination registers. 

21 FIG. 3 is a high-level functional block diagram of the test generation apparatus of the 

22 present invention. As shown in FIG. 3, the present invention includes a user interface 202, an 

13 
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1 inputter 204, a user preference queue 206, an instruction packer 208, a rules table 209, a set of 

2 resource-related data structures 210, a template generator 212, one or more templates 214, a 

3 mode controller 220, an instruction generator and simulator 2 1 6, and an output generator 218. 

4 The user interface 202 allows users of the present invention to specify the contents, 

5 preferences, and control information required to create a functional verification test of a 

6 simulated processor or processor system. In a preferred embodiment, the user interface is a 

7 graphical, interactive user interface wherein the user can specify the types of instructions to be 

8 simulated by selecting from a menu of instruction choices. In addition, the user can specify test 
9 y generation preferences, such as whether instructions are to be generated using a biased 

10 ^ instruction tree or an ordered instruction list, for example, or whether iterative loops should be 

Q 

1 1 g inserted, how many instructions can be executed in parallel, whether certain instructions should 

m 

12 \| be executed in parallel with others, or whether the generator should insert and run macros 
13M> between instructions. Those skilled in the art are familiar with the types of instructions and 

14 M- typical generation controls necessary for the dynamic generation of biased pseudo-random test 

15 0 patterns for processor verification, and therefore further detail regarding instruction selection and 

16 generation preferences is not provided herein. In addition, those skilled in the art will recognize 

17 that a graphical user interface is optional; advanced users may provide inputs to the present 

18 invention using other input methods, such as creating and providing an input text file that 

19 contains the necessary instruction generation information. 

20 The user interface 202 also enables the user to specify test generation controls that 

21 control the generation mode and dictate the user's controls for specific instructions. For 

22 example, in a preferred embodiment, the user can assign specific registers to be source registers 

14 
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1 or destination registers. Alternatively, the user can assign a specific initial source or destination 

2 value for registers and memory locations. Users can specify specific addresses as target 

3 addresses for branch instructions or memory operations. Those skilled in the art will recognize 

4 that current dynamic pseudo-random test pattern generators generally provide similar instruction 

5 and test generation controls, and therefore, further detail regarding the capabilities of the present 

6 invention to provide the user with control flexibility and options regarding dynamically 

7 generated and executed instructions is not provided here. 

8 The inputter 204 accepts test content, preference, and control information from the user 
9^ via the user interface 202, and adds that information to the user preference queue 206 and the 

10 ^ mode controller 220. Each entry in the user preference queue 206 is an instruction, a group/tree 

Q 

lip of instructions, or a generator command. If the user has indicated that an ordered list of 

Ul 

12 Sj instructions is to be generated and simulated, then the inputter puts that list into the user 

13H preference queue in the order specified. If, on the other hand, the user has indicated that some 

14 number of instructions are to be generated randomly, then the inputter adds an entire tree of 

Q 

15 2 instructions, which may or may not be biased by the user. When the user preference queue 206 

16 shrinks below a threshold, additional elements are added by the inputter 204. 

17 The operation of the mode controller 220 is described in detail in copending patent 

18 application, U.S. Patent App. Ser. No. , filed 10.11.00 (10 November 2000) 

19 (docket number 62061.0103), now U.S. Pat. No. , entitled "Method and 

20 Apparatus for Static Test Pattern Generation Within a Dynamic Pseudo-random Test Program 

21 Generation Framework," (hereinafter, the "Static/Dynamic Generator Patent") which is 

22 incorporated by reference into this specification for all purposes. In addition, the Static/Dynamic 

15 
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1 Generator Patent also describes the use of templates 214 and the operation of the test generator in 

2 both sequential mode and dynamic mode, and details regarding templates and the test generator 

3 operating modes are not repeated herein. While the present invention makes use of templates as 

4 described in the Static/Dynamic Generator Patent, the special sequential operating mode of the 

5 test generator described in the Static/Dynamic Generator Patent does not apply to the present 

6 invention. For the purposes of this disclosure, the reader should assume that the test generator is 

7 always operating in dynamic mode. 

8 Returning to FIG. 3, information from the user preference queue 206, the resource-related 
9© data structures 210, the rules table 209, and existing templates 214 is combined in the instruction 

lOy^ packer 208 to create sequences of instructions or groups of parallel instructions for generation 

1 1 p and simulation. As explained in further detail below, the resource-related data structures 210 are 

m 

12%| data structures that are created, maintained, and updated to track the actual and/or predicted past 

13 H state, present state, and future state of selected system resources. In this specification the term 

| a jj„ 

14 ^ "system resources" is defined to include both architectural resources and conceptual resources. 

15 y Architectural resources include resources like the architectural state of the processor (as defined 

16 by the state of specific registers), other registers, the load/store buffer, and the like. The term 

17 "conceptual resources" refers to resources that may not actually exist within the architectural 

18 state of the machine under test, but instead represent a simple way to represent complex 

19 interactions, such as whether or not a branch instruction is pending, or the usage of various 

20 execution units over time, or estimated bus traffic at a particular point in time. Practitioners of 

21 the present invention may wish to adopt the tracking methodologies and approaches described in 

22 detail in copending patent application, U.S. Patent App. Ser. No. , filed 

16 
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1 10.11.00 (10 November 2000) (docket number 62061.0106), now U.S. Pat. No. 

2 ; , entitled "Method and Apparatus that Tracks Processor Resources in a 

3 Dynamic Pseudo-Random Test Program Generator" (hereinafter, the "Resource Tracking 

4 Patent") which is incorporated by reference into this specification for all purposes, to identify 

5 system resources appropriate for tracking and to track those resources. 

6 The instruction packer 208 uses the resource-related data structures to determine whether 

7 resources are currently available, or will be available, when they are needed by specific 

8 instructions. Instructions using resources that are either currently in use, or will be in use when 

9^ they are required are not eligible to be selected and scheduled by the instruction packer 208. 

w 

lOyT The rules table 209 comprises a table of instruction grouping rules, as dictated by the 

Q 

1 1 p architecture of the processor under test. For example, the rules table 209 for the hypothetical 

Lfj 

12 S| VLIW processor shown in FIG. 2 would indicate that two floating point addition instructions 

S 

13H= cannot be issued and executed in parallel. Likewise, two branch instructions, two integer 

14Hj arithmetic instructions, or two load/store instructions cannot be packed into the same VLIW 

15y group. In a preferred embodiment, the rules table 209 may also include other architecturally- 

16 related information useful to the instruction packer, such as the number of processor cycles each 

17 individual execution unit requires to execute an instruction, or whether the processor is capable 

18 of handling VLIWs of varying lengths, or whether the instruction issue unit can issue 

19 instructions to selected idle execution units while others are still operating on previously issued 

20 instructions, or whether there are restrictions on register usage by instructions within a single 

21 VLIW. Those skilled in the art are very familiar with these architecturally-dictated design rules 

22 published by processor manufacturers, and the need to adhere to those rules when developing 

17 
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1 new processor systems utilizing the architectural approaches and limitations of existing 

2 processors. 

3 In a preferred embodiment, the instruction packer 208 selects instructions from the user 

4 preference queue 206 and packs them into VLIW groups in two different ways: either randomly, 

5 using an instruction tree, or consecutively, according to the order specified by the user. Each 

6 instruction selection methodology is explained below. 

7 If the user has specified, through the user interface, that some number of instructions 

8 should be generated randomly, then the present invention will place an instruction tree into the 

9^ user preference queue 206. The instruction packer 208 selects a first instruction that can be 

-43 

10|? executed, based upon the current system state information and resource availability information 

b 

lip contained within the relevant resource-related data structures 210. The instruction packer 208 

m 

12SJ then selects a group of potentially valid instructions in the user preference queue that can be 

13H* selected next by eliminating all instructions that are ineligible for selection. For example, 

14N instructions within the instruction tree are ineligible for selection if they cannot be executed in 

? s=sl 

15^ parallel with the first instruction selected, as indicated by the rules table 209. Instructions are 

16 also ineligible for selection if their resource needs conflict with the first instruction selected, such 

17 as might occur if a potential instruction needs a destination register that the first selected 

18 instruction will be using. Those skilled in the art will understand that resource conflicts will be 

19 ordinarily be dictated by the processor architecture, just as a processor's instruction grouping 

20 rules are dictated by the architecture. For example, some architectures may allow two 

2 1 instructions within the same VLIW packet to utilize the same register if it is a source register, but 

22 will not allow two instructions to use the same destination register. Others may prohibit all 

18 
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1 forms of register sharing. Practitioners of the present invention will thus customize the resource 

2 conflicts rules implemented in a specific embodiment as required to reflect the architectural 

3 requirements and rules of the processor under test. 

4 After the instruction packer 208 eliminates all ineligible instructions from the instruction 

5 tree, the instruction packer 208 then selects a second instruction for execution in parallel with the 

6 first instruction. The instruction packer 208 continues in this manner, repeating the elimination- 

7 and-select process by eliminating instructions that utilize either a system resource that conflicts 

8 with any previously selected instruction, or that cannot be executed in parallel with any 
9¥ previously selected instruction. The selection process ends when either the maximum number of 

yo 

\0i2 instructions has been selected (five, in the case of the hypothetical VLIW processor shown in 

11 5 FIG. 2), or no more instructions from the instruction tree are eligible for selection. In the latter 

y i 

12 SJ case, if the VLIW group is not completely packed and the processor architecture requires all 

2 

13M 1 VLIW packets to be of a specific length and format, the instruction packer will fill the remaining 

S X 

14H; slots with "no operation" instructions. Alternatively, if the processor architecture allows VLIW 

i H 

packets of varying lengths, "no operation" instructions may not be required and the VLIW 

16 packet can continue through the process as is. 

17 When the user has specified that a specific listing of instructions are to be executed in 

18 parallel by providing an ordered instruction list to the present invention via the user interface, the 

19 instruction packer 208 follows that sequence, to the extent that the user's instruction sequence 

20 comprises a group of instructions that can be validly executed in parallel. In other words, the 

21 instruction packer 208 selects the user's first specified instruction from the user preference queue 

22 206 if system resources are available, then selects the user's second specified instruction if 

19 
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system resources are available and the user's second specified instruction can be validly grouped 
with the first instruction, according to the processor's instruction grouping rules. The instruction 
packer 208 then selects the user's third specified instruction, assuming it does not utilize 
resources that conflict with the first and second instructions and can be validly grouped with the 
first and second instructions, and so forth. When the instruction packer 208 encounters an 
instruction that is ineligible for execution, either because of conflicting resources or because of 
conflicting grouping rules, the instruction packer 208 stops selecting instructions for a given 
VLIW packet, and either issues the VLIW packet as is, or fills the remaining slots with "no 
operation" instructions if required by the processor architecture. When the first instruction on a 
user's ordered list cannot be selected because of conflicting resources, the instruction packer 
selects one or more "no operation" instructions until the required resource becomes free and the 
instruction can be validly selected and packed into a VLIW bundle. 

For example, assume that the user is testing a processor system having an architecture 
like the hypothetical VLIW processor shown in FIG. 2. The user has therefore specified that up 
to five instructions can be packed into one very long instruction word for simulation. Also 
assume, for this example, that the user has specified that 100 instructions are to be randomly 
generated for the test. 

The present invention places an instruction tree into the user preference queue 206. The 
instruction packer 208 checks the resource-related data structures 210, selects a first instruction 
based upon resource availability, and places the instruction into the proper position in the VLIW 
instruction format as defined by the processor's instruction rules. Assume, for this example, that 
the first instruction selected is a floating point addition instruction that will require operands A 
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1 and B and registers 1, 2, and 3. 

2 For simplicity, we will also assume for this example that the hypothetical VLIW 

3 processor does not allow any resource sharing. In this case, the instruction packer 208 then 

4 "marks out" all instructions within the tree that require operands A and B and registers 1, 2, and 

5 3, and additionally, all other instructions in the tree that must be executed by the processor's 

6 floating point addition execution unit. The instruction packer 208 then selects a second 

7 instruction from the remaining instructions in the tree. The second instruction must meet the 

8 same requirements as the floating point instruction selected as the first instruction: all system 

9^ resources required to execute the second instruction must be currently free, the second 

SJ 

lOy] instruction may not utilize the same system resources required by the instruction already 

i B=3 l 

HQ selected, and it must be an instruction that can be validly executed in parallel with the floating 

m 

12SJ point instruction. Returning to the hypothetical VLIW processor shown in FIG. 2, if the 

Dp 3 instruction packer selects, as the first instruction, a floating point addition instruction that will 

i - 

14^ require operands A and B and registers 1, 2, and 3, a valid second instruction might be a floating 

152; point multiplication instruction using operands D and E and registers 4, 5, and 6. The instruction 

16 packer would not select another floating point addition instruction, because that selection would 

17 violate the processor's instruction grouping rules. Similarly, the instruction packer would not 

18 select a load/store instruction or a branch instruction involving operands A or B, or any of 

19 registers 1, 2, or 3, because those system resources will be in use and tied up for some number of 

20 cycles by the floating point instruction selected as instruction number one. After a second 

21 instruction is selected, the instruction packer selects a third instruction, and so forth, until (in this 

22 example) either all five instruction slots are filled, or there are no remaining potentially valid 
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1 instructions remaining in the original instruction tree loaded into the user preference queue. 

2 After the instructions are selected and packed into the VLIW bundle by the instruction 

3 packer 208, the template generator 212 generates a template 214 corresponding to each 

4 instruction. In this specification, a template 214 is a data structure that contains an instruction, 

5 plus all preferences and data required to execute the instruction. The template for each 

6 instruction in a VLIW packet also indicates that it is to be executed in parallel with other 

7 instructions, in the manner specified by the processor's design rules. 

8 For example, some VLIW processors utilize a bit in the instruction that indicates that the 

9^ instruction is to be issued and executed in parallel with whatever instruction immediately follows 

SJ 

lOy, it. The instruction that follows may also have that bit set, indicating that the instruction that 

b 

1 lp follows it should also be issued and executed at the same time as the previous two instructions, 

Ul 

12M and so on. The processor "finds" the last instruction in each group designated for parallel issue 

13?* and execution because the last instruction does not have the parallel instruction bit set, or 

lCj alternatively, includes another indication that it is the last instruction in a parallel execution 

15^ group. 

16 Other VLIW processors may require an indication in the first instruction in a group 

17 designated for parallel execution that the next "N" instructions should be fetched, issued, and 

18 executed simultaneously. These parallel execution rules are defined by the processor architect, 

19 in the same manner as the instruction grouping rules discussed above. Those skilled in the art 

20 are familiar with the various techniques processor designers use to designate instructions for 

21 parallel execution, and practitioners of the present invention that are adapting the present 

22 invention for use with various VLIW architectures will understand how to build templates that 
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1 include the proper indications for parallel execution on a processor under test designed in 

2 accordance with a specific architecture. 

3 After templates 214 are created, the instructions corresponding to the templates are 

4 generated and then simulated. The instruction generator and simulator 216 converts each 

5 template 214 into an instruction for the golden model. In a preferred embodiment, this process 

6 involves translating the instruction into assembly language or binary, resolving all the 

7 instruction's subfields, and initializing any necessary resources to create a valid instruction. The 

8 instruction generator and simulator 216 then either calculates or predicts current and future 

9^ resource utilization as described in further detail below and updates the appropriate resource- 

Ik j 

10tT related data structure 210. The instruction generator and simulator 216 initiates a step function 

lip of the architecturally correct golden model and sends the outstanding instructions to the golden 

Us 

12 SI model, which simulates the parallel execution of the instructions and updates its state registers 

13M 1 accordingly. The instruction generator and simulator 216 assesses the architectural state of the 
golden model as a function of time (processor cycles) and updates the appropriate resource- 

P 

15"ff related data structures 210 in preparation for the next iteration of the instruction selection 

16 process, and for utilization by the output generator 218, as described below. 

17 The output generator 218 generates a machine readable output file containing the test 

18 program. The output test file contains the starting state for each component in the system, the 

19 commands executed as a function of time, and the intermediate and final states of the golden 

20 model as a function of time. Users can then run the output test file on a processor system under 

21 test, which provides the same test stimulus to the system under test as was provided to the golden 

22 mode. Users then compare the time-based intermediate and final states of the system under test 
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1 with the known correct time-based intermediate and final states of the golden model, to verify 

2 the functionality of the device or system under test. 

3 FIGs. 4A and 4B show a flowchart that illustrates the control and operational flow of the 

4 present invention as it creates functional verification tests. At 302, the test operator provides the 

5 test inputs (test content, control, and preference information as described above), which are 

6 added to the user preference queue at 306 as required by 304. At 308, the instruction packer 

7 interfaces with the user preference queue 306, the resource-related data structures represented in 

8 FIG. 4A at 312, 314, and 316, and the processor grouping rules table at 309. Using the FIG. 2 

9O VLIW processor as an example, at 308, the instruction packer selects five instructions to be 

yQ 

10 f packed together into a VLIW in accordance with the procedure described above. These five 

Q 

llg instructions are then sent in a group to the template generator, which at 310, creates a template 

Ul 

12 sj for each instruction in the VLIW that is marked for parallel execution in accordance with the 

13H* processor's architectural rules, as described above. Therefore, continuing with the hypothetical 

14 M VLIW processor as an example, the template generator might create the following group of 

Q 

15^ templates corresponding to the five possible instructions packed into the VLIW as follows: 



Template No. 


Program Counter 


Template contents 


1 


1000 


Floating point addition instruction and associated 
information, marked for parallel execution with following 
instruction 


2 


1001 


Floating point multiplication instruction and associated 
information, marked for parallel execution with the 
following instruction 


3 


1002 


Integer addition instruction and associated information, 
marked for parallel execution with the following instruction 


4 


1003 


Branch instruction and associated information, marked for 
parallel execution with the following instruction 


5 


1004 


Load/Store instruction and associated information, not 
marked for parallel execution 
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1 

2 After the group of templates is created at 310, the present invention jumps to the main 

3 loop of the test generator 320, shown in FIG. 4B, and checks the operational mode at 326. As 

4 the reader will recall, the test generator will always be in dynamic mode when implementing 

5 parallel scheduling and simulation of instructions as described herein. 

6 At 328, the instructions corresponding to all outstanding templates are generated at 328, 

7 resources are calculated 330, estimated 332, the appropriate resource-related data structures are 

8 updated at 3 12 and 3 14, and the group of instructions are then sent to the golden model at 334 for 
9O simulation. The present invention evaluates the state of the golden model and records that state, 

10:*: as a function of time, for the output file. The present invention also updates the appropriate 

b 

llg resource-related data structures that correspond to the golden model architectural state in 

12SJ preparation for the next instruction selection and packing step. 

13H> At the next check at 324, a template will not exist at the current program counter, and the 

14H* present invention will return to the instruction packer, which then selects a new group of 

: a 
w 

15^ instructions for parallel execution using the procedure described above, and the entire process 

16 repeats until the end of the test. 

17 As described above, the present invention is also capable of generating tests where the 

18 user can specify a specifically ordered group of instructions to be generated for parallel 

19 execution. In that case, the user provides an ordered list of instructions via the user interface, 

20 rather than an instruction tree. In this case, as described above, the instruction packer selects 

21 instructions from the list in the user preference queue in the order specified by the user, assuming 

22 that system resources are available and the instructions listed do not violate the processor's 
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1 instruction grouping rules. Groups of instructions are sent to the template generator, and groups 

2 of templates are created at 310 in the same manner as that described above. After a group of 

3 templates is created for a VLIW packed by the instruction packer, the present invention jumps to 

4 the main loop 320, and generates and simulates the instructions as described above. Following 

5 simulation of each VLIW packet, the present invention will return to the instruction packer at 

6 308, which will then pack the next VLIW group using the instructions in the user's list, picking 

7 up where it left off with the previous VLIW. The present invention will continue packing VLIW 

8 instructions and creating template sequences in this manner, as long as nonconflicting resources 
9O are available and as long as the processor's instruction grouping rules are not violated. If the 
10 s ^ present invention encounters an instruction on the user's list that cannot be selected due to (for 
1 1 2 example) unavailable system resources, in one embodiment, the instruction packer simply packs 

Iks? 

m 

I2vj one or more "no operation" instructions into VLIW groups until the resource becomes available 

13;U and the instruction can be packed, generated, and simulated. 
Resource tracking . 

15Q As described above, the present invention selects instructions appropriate for parallel 

O 

16 execution by, among other things, examining the available system resources and selecting only 

17 those instructions that can be executed because the system resources they require are or will be 

18 free at the appropriate time. After groups of instructions are generated at 328 in FIG. 4B, 

19 resource usage is reevaluated at 330 and 332, and the appropriate resource-related data structures 

20 are updated at 3 12 and 314. These updated resource-related data structures affect the selection of 

21 the next group of instructions that the instruction packer selects for the next VLIW packet to be 

22 generated and simulated. 
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1 The present invention tracks the current state of architectural resources, such as 

2 individual registers, the load/store buffer, the architectural state of the processor as defined by a 

3 specific subset of registers, and the like, as a function of time. The present invention also tracks 

4 the utilization of the various execution units within the processor, again as a function of time. 

5 For example, in an embodiment of the present invention customized to generate a test for a 

6 processor system designed according to Texas Instruments' TMS320C62xx architecture, floating 

7 point instructions require four processor cycles to complete. The present invention tracks the 

8 utilization of the floating point execution unit, and the instruction packer will only select a 

9 G floating point instruction when the floating point unit is free, as indicated by the relevant 
10/7 floating-point execution unit data structure. Furthermore, when a floating point instruction is 
1 1 » packed into a VLIW packet and the corresponding floating point instructions are generated by 

m 

I2^j the instruction generator, the present invention designates the floating point execution unit as 

13L£ unavailable for four processor cycles. Assuming a VLIW group is packed and issued during 

14N= each processor cycle, the instruction packer is thus barred from selecting another floating point 

15 Q instruction until four processor cycles have elapsed, at which point the floating point unit 

16 becomes available. Similarly, the destination register(s) that the floating point instruction uses 

17 are also unavailable for four processor cycles, after which they will be updated to reflect the 

18 values determined by the execution of the floating point instruction. The instruction packer is 

19 thus barred from selecting another instruction that requires the same destination registers used by 

20 the floating point instruction until four processor cycles have elapsed. Those skilled in the art 

21 will recognize that, while neither the floating point execution unit utilization nor the values in or 

22 utilization of specific general purpose registers is reflected in the defined architectural state 
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1 within the golden model, tracking these system resources to aid in the efficient scheduling of test 

2 instructions is highly useful. 

3 Practitioners of the present invention that are customizing the present invention for a 

4 specific processor architecture can glean the information required to track relevant system 

5 resources from the programmer's manual provided by the processor's manufacturer. After 

6 reading this specification or practicing the present invention, practitioners of the present 

7 invention will understand that there are a number of different design approaches that will each 

8 accomplish the resource tracking described herein. For example, practitioners of the present 
9^ invention might include a resource tracker at 330 that examines the type of instruction generated 

l0 il ( e -S-> a floating P°i nt instruction or a branch instruction), turns to a table to determine the number 

lip of processor cycles to mark the associated execution unit as <c unavailable" (e.g., 4 cycles, in the 

U1 

\2sA case of the floating point execution unit or 2 cycles, in the case of the branch unit), and updates 

13H the data structure associated with the relevant execution unit to reflect that the execution unit is 

M- 

14^ unavailable for the appropriate number of cycles. Similarly, the same tracker might use the same 

0 

15S table to mark the registers and other resources associated with the instruction (identified on the 

16 instruction's template) as unavailable during the appropriate number of processor cycles. 

17 Practitioners of the present invention might choose to incorporate two pieces of information into 

18 the data structure associated with registers: the availability of the register, and the value 

19 currently loaded into the register. Alternatively, practitioners might choose to utilize separate 

20 data structures for individual register availability and individual register value. 

21 When tracking the future state of selected system resources (i.e., unavailability for a 

22 specific number of processor cycles), those skilled in the art will recognize that the tracker must 
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1 also update the data structures associated with the system resources at the appropriate time to 

2 reflect that the resource is now available again, thus indicating to the instruction packer that an 

3 instruction that requires the newly-freed resource can now be properly selected and packed into a 

4 new VLIW. 

5 Unlike the execution unit utilization or specific register utilization or states, some system 

6 resources that would be useful to track to maintain efficiency in scheduling instructions cannot 

7 be easily tracked. For example, processors typically execute loads and stores from the load/store 

8 buffer on a non-interrupt, background basis. Consequently, it is difficult to determine the precise 
9Q contents of the load/store buffer at a specific point in time, because it is difficult to determine at a 

10 s "; specific instance whether pending loads and stores have been written to and from memory. 

11*~ Nevertheless, those skilled in the art can construct a heuristic model that represents a prediction 

1 n 

<k? E 

12sj of traffic in the load/store buffer as a function of the number and timing of load/store instructions 

13M° generated in a specific test. When the predicted traffic through the load/store buffer reaches a 

Lis 

14N 3 certain predetermined threshold, the present invention can update a data structure that 

15O corresponds to the load/store buffer to designate the load/store buffer as unavailable. This will 

16 prevent the instruction packer from selecting further load/store instructions for packing into 

17 further VLIW groups until the load/store traffic drops below the specified threshold, at which 

18 time the tracker redesignates the load/store buffer as available. Alternatively, the user may want 

19 to specifically test a processor's ability to function properly when the load/store buffer is highly 

20 stressed by dense traffic to and from memory. In this case, the data structure corresponding to 

21 the load/store buffer would not be marked as unavailable, and the instruction packer would 

22 continue to select and pack load/store instructions, but the user would want to continue to 
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monitor the traffic density in the load/store buffer over time. 

System resources that practitioners of the present invention may find useful to track, 
either directly as described in the execution unit example, or by predictive methods as described 
in the load/store buffer example, include the utilization of various execution units within the 
system, state and utilization of general or special purpose registers, subfields of general or 
special purpose registers, instruction and data caches, system bus status, physical memory, 
virtual memory system page tables, and resources capable of generating system exceptions and 
interrupts. The reader is referred to the Resource Tracking Patent for details on resource tracking 
methodologies that can be employed to generate resource-related data structures appropriate for 
use with the present invention. 

To summarize, the present invention is a method and apparatus that generates a test that 
enables the verification that a processor system under test properly executes two or more 
instructions issued and executed in parallel. The present invention includes a user preference 
queue, a rules table, plurality of resource-related data structures, an instruction packer, and an 
instruction generator and simulator. The user preference queue has a number of entries, each of 
which comprises an instruction to be tested, a group or tree of instructions to be tested, or a test 
generator control command. The resource-related data structures comprise information such as 
actual and/or predicted past state, present state, and future state of selected system resources. 

The instruction packer creates a very long instruction word that can comprise two or 
more instructions valid for parallel execution by the processor under test. The instruction packer 
selects instructions from the entries in the user preference queue, based upon resource 
availability as indicated by the resource-related data structures and the architecturally-related 
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1 instruction grouping rules appropriate for the golden model and the processor system under test. 

2 The instruction packer also insures that instructions within a single VLIW do not require 

3 conflicting system resources. 

4 The present invention marks instructions that have been packed into a VLIW to indicate 

5 that the instructions should be executed in parallel. The instruction generator and simulator 

6 generates the instructions that correspond to each instruction selected by the instruction packer, 

7 and then simulates the execution of the instructions on the golden model. The present invention 

8 updates the appropriate resource-related data structures that correspond to the system resources 
9O utilized by the instruction. The present invention also evaluates the architectural state of the 

l(>rf golden model as a function of time and updates the appropriate resource-related data structures. 

Q 

llg Other embodiments of the invention will be apparent to those skilled in the art after 

12sj considering this specification or practicing the disclosed invention. The specification and 

13M* examples above are exemplary only, with the true scope of the invention being indicated by the 

14M 3 following claims. 

U 
15^ 
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